iCloud Photos on iOS or macOS provides a comprehensive way to tag faces: every photo in the library is scanned, and the detected faces are identified and tagged based on the names defined earlier.
I am a photography enthusiast and have been taking photos since 2004. As of today that's over 200k shots, of which I've kept about 40k. Many desktop applications can do face recognition, but it would be far more fun to build the solution end to end myself.
Many other interesting use cases could be built on top of this, for example greeting whoever is sitting in front of the computer by name. However, that won't be covered in this project.
There are many great online blogs/projects that discuss the detailed mathematics and implementation of face recognition, for example this one. I don't plan to approach my problem that way, as it would be too difficult and time consuming for me.
The first two questions that came to my mind for this project were how to minimize the distracting factors in the photos, and how to extract features a computer can understand. All my photos are about something or somebody, and I bet none of them is about a single face only. This project is about face recognition, so everything other than the faces is 'noise' and should be removed before any machine learning kicks in. A 300 * 300 pixel color photo is quite good enough for a human eye to distinguish faces, but it's a vector of size 300 * 300 * 3 = 270,000, which sounds too big for today's computers to learn from directly.
After some research and experiments I decided to use OpenCV to detect/extract faces, and then Google Inception V3 to extract features from the face photos. Google Inception V3 is a very popular deep learning model (a convolutional neural network) for image recognition. By extracting its pooling layer, the feature space of an image is drastically reduced from a vector of size 270,000 down to 2,048.
Given the context, face recognition for my family members, this project boils down to a multi-class classification problem.
In this project both accuracy and speed will be considered when evaluating the models. A model is useless if it performs only slightly better than a random guess, and speed matters especially when dealing with data at this scale.
The time metric is quite intuitive: both training and testing time will be measured. Accuracy is slightly more complicated. For a classification problem we can look at precision, recall and the f-beta score; precision_recall_fscore_support from sklearn computes all of them in one go. I used to have great difficulty distinguishing precision from recall until I found a way to describe them in two sentences, under the premise that God is the ground truth. Precision measures how often God agrees with me when I say it's positive. Recall measures how many of the cases God says are positive I manage to find. Below are the formulas for the precision and recall rates, where TP, FP and FN are the counts of true positives, false positives and false negatives.
precision = TP / (TP + FP)
recall = TP / (TP + FN)
In this project, when the model predicts label A, I need to know how many of those predictions are truly label A; this is precision. On the other hand, I also want to know, out of all instances with label A, how many are picked up by the model as label A; this is recall. F-beta combines both rates into one metric. I consider precision and recall equally important, so beta is set to 1.0 and f-beta becomes the F1 score with the formula below.
f1 = 2 * (precision * recall) / (precision + recall)
Therefore, the accuracy of each model will be based on its F1 score. For better intuition, a confusion matrix will also be plotted to give a visual feel for the accuracy.
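As a tiny sanity check, the three metrics can be computed by hand for one class from made-up labels (all numbers below are illustrative only):

```python
# Toy example (made-up labels) of precision, recall and F1 for one class, 'A'.
y_true = ['A', 'A', 'A', 'B', 'B', 'C']
y_pred = ['A', 'A', 'B', 'A', 'B', 'C']

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 'A' and p == 'A')  # God and I agree it's 'A'
fp = sum(1 for t, p in zip(y_true, y_pred) if t != 'A' and p == 'A')  # I said 'A', God disagrees
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 'A' and p != 'A')  # God said 'A', I missed it

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```

This is exactly what precision_recall_fscore_support does per class, before averaging.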
The seven categories are: me, wife, daughter, son, dad, mum, brother. All my photos are in D:\Pictures; the majority exist in both .jpg and .nef format. .nef is the raw image format of Nikon cameras, and the .jpg is the copy produced after post-processing the raw file.
The output of the code chunk below shows the root directory of my photo library. Each photo's full address always follows D:\Pictures\[yyyy]\[yyyy.mm.dd] - [event name]\[yyyymmdd-hhmm][-index].jpg
import os
import time
cur_dir = os.getcwd()
#print(cur_dir)
target_image_dir = os.path.join(cur_dir, 'images')
photo_dir = r'D:\Pictures'  # raw string so backslashes are kept literal
os.listdir(photo_dir)
A glance at a recent photo directory. Every file uses its timestamp as the file name. .nef is the raw image file and .xmp is generated by Adobe Lightroom; neither is applicable to this project. Only the .jpg files are used.
os.listdir(os.path.join(photo_dir, '2018'))
os.listdir(os.path.join(photo_dir, '2018', '2018.01.01 - 彤彤和祺祺'))
All the photo files are in folders whose names start with the year. There are slightly more than 40k photos; the function below dumps all their full paths, and 5 photos are randomly picked to show the full address.
def get_all_file_path(folder_addr):
"""Return all jpg files' full path as a list
Args:
        folder_addr (str): The folder address.
Returns:
all_jpg (list): A list of strings which are the full path of jpg files.
"""
all_jpg = []
for root, dirs, files in os.walk(folder_addr):
# All the target photos are in D:\Pictures\20xx. Get the jpgs from them only.
path = root.split(os.sep)
if len(path) < 3:
continue
else:
year = path[2]
if year[:2] != '20':
continue
#print((len(path) - 1) * '---', os.path.basename(root))
for file in files:
if file[-3:].lower() == 'jpg':
#print(len(path) * '---', file)
all_jpg.append(os.path.join(root, file))
return all_jpg
all_jpg = get_all_file_path(photo_dir)
import random
print('Number of jpgs:', len(all_jpg))
for i in random.sample(range(len(all_jpg)), 5):
print(all_jpg[i])
This section demonstrates how to read and display an image file.
# Import required libraries for this section
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import math
import cv2
DISPLAY = True
SAVE = True
NODISPLAY = False
NOSAVE = False
def get_numpy_from_file(file_path):
    """Return the image file as a numpy array.
    Args:
        file_path (str): The full address of the image file.
    Returns:
        image (numpy.ndarray): The numpy array of the image file. Shape of (height, width, number of channels).
    Note:
        The file_path may contain unicode characters, so cv2.imread() cannot be used directly.
        The approach below works for both unicode and non-unicode paths.
    """
    with open(file_path, 'rb') as file_stream:
        bytes_arr = bytearray(file_stream.read())
    numpy_ar = np.asarray(bytes_arr, dtype=np.uint8)
    image = cv2.imdecode(numpy_ar, cv2.IMREAD_UNCHANGED)
    print('Image numpy array shape:', image.shape, type(image))
    return image
def display_from_numpy(image, fig_dim_x=10, fig_dim_y=10, plot_nrows=1, plot_ncols=1):
"""Display the image from a given numpy array (the returned result from get_numpy_from_file() function).
Args:
        image (numpy.ndarray or list[numpy.ndarray]): A single image array, or a list of image arrays.
Returns:
inline display of the image.
"""
fig = plt.figure(figsize=(fig_dim_x, fig_dim_y))
if isinstance(image, list):
image_list = image
else:
image_list = []
image_list.append(image)
for i in range(len(image_list)):
ax = fig.add_subplot(plot_nrows, plot_ncols, i+1, xticks=[], yticks=[])
ax.set_title('Sample Image')
ax.imshow(image_list[i])
def display_from_file(file_path):
"""Display one image file inline.
Args:
file_path (str): The full address of the file.
Returns:
None: Display image inline.
"""
image = get_numpy_from_file(file_path)
# Need to convert to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
display_from_numpy(image)
file_path = os.path.join(photo_dir, '2017', '2017.11.16 - Singapore Fintech Festival', '20171116-1923.jpg')
display_from_file(file_path)
In this project, I used Haar Cascades Classifier from OpenCV for face detection.
This classifier is pre-trained to determine whether a region of interest (say, of size 20 * 20) in the image is a face or not. So, to find all the faces in an image, this detection needs to be repeated for every possible region, which is a huge amount of computation. To keep this tractable, the classification of each region proceeds in stages: if, say, 10 features are checked at an early stage, classification only continues to the later stages when the result so far is positive. This way, no time is wasted checking the remaining features on regions that have already failed.
Then the image is resized by a predefined scale and the detection is performed again, repeating until the image becomes too small. This scaling factor is controlled by the parameter scaleFactor, whose value should be slightly more than 1.
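To get a feel for what the text above implies, here is an illustrative sketch (not OpenCV's actual implementation) of the sequence of image sizes scanned with scaleFactor=1.3, starting from a hypothetical 600 * 400 image down to roughly the 20 * 20 detection window:

```python
# Illustrative image pyramid implied by scaleFactor=1.3:
# repeatedly shrink the image until it approaches the ~20x20 detection window.
def pyramid_sizes(width, height, scale_factor=1.3, min_size=20):
    sizes = []
    w, h = float(width), float(height)
    while w >= min_size and h >= min_size:
        sizes.append((int(w), int(h)))
        w, h = w / scale_factor, h / scale_factor
    return sizes

for size in pyramid_sizes(600, 400):
    print('%d x %d' % size)
```

A smaller scaleFactor means more pyramid levels, hence better recall but more computation.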
Another parameter, minNeighbors, controls how 'sensitive' the classifier is. Since detection is performed at different scales of the same image, a true face is likely to be detected as a 'face' at several scales, for example 1.0, 0.9 and 0.8 (maybe not at 0.7), while a non-face object may be detected at scale 1.0 only. The value of minNeighbors should therefore be around 3 or larger.
It's difficult to find the best values for these parameters. The Haar Cascades Classifier is fast, but neither its precision nor its recall is perfect. Luckily I have a large raw dataset, so I don't think this is a concern for this project. After a few tries, I set scaleFactor to 1.3 and minNeighbors to 5.
An input image of 299 * 299 pixels with 3 color channels lives in a space of almost 270k dimensions, which is too big to be processed directly by the machine learning algorithms. Hence Google Inception V3 is applied first to extract feature vectors of dimensionality 2,048. Maybe that's still too big for this project, but we'll see.
In this project I'm extracting the avg_pool layer whose output shape is (, 2048).
Four models will be evaluated: SGD Classifier, K-nearest Neighbours, Logistic Regression and a deep learning model. The winner will then go through grid search cross validation to find the best hyper-parameters and, hopefully, push the accuracy even further.
SGD Classifier.
Both the SGD Classifier and a deep learning model trained with a Stochastic Gradient Descent optimizer work in a similar 'SGD' way: when one sample, or a batch of samples, is fed into the model, a loss (the distance between the output and the expected output) is computed. This loss value is then used to update the parameters/weights of the model so that it leans towards the expected output.
The SGD classifier prefers its input data to be normalized to zero mean and unit variance. In this report the data is instead scaled from [0, 255] to [0, 1]. That's not exactly what the SGD Classifier favours, but I standardize to [0, 1] for all the models here. Default hyper-parameters will be used.
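To illustrate the 'compute loss, then nudge the weights' cycle described above, here is a minimal single-step sketch with made-up numbers, using a squared loss on a linear model (not the hinge loss that SGDClassifier actually defaults to):

```python
import numpy as np

# Hypothetical single SGD step on a linear model with a squared loss.
rng = np.random.RandomState(0)
w = np.zeros(3)          # model weights
x = rng.rand(3)          # one sample's features
y = 1.0                  # the expected output
lr = 0.1                 # learning rate

pred = w.dot(x)
loss_before = (pred - y) ** 2
w = w - lr * 2 * (pred - y) * x        # step along the negative gradient of the loss
loss_after = (w.dot(x) - y) ** 2
print(loss_before, '->', loss_after)   # the loss shrinks after the update
```

Repeating this step over many samples and epochs is what both SGDClassifier and a deep network's optimizer do.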
K-nearest Neighbour.
Instead of constructing a generalized model, KNN stores all the training data. At prediction time, the distance (e.g. Euclidean distance) from the testing data point to every training data point is computed. Then, based on the chosen K, a majority vote among the K nearest neighbours decides the class of the testing data point. After a few tries, I set K=7.
KNN is simple, but it can be very time consuming at test time, since distances to all the training data are needed. In addition, due to the curse of dimensionality, it is not always effective.
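A minimal sketch of the distance-plus-majority-vote idea, on made-up 2D points (the real features here are 2048-dimensional):

```python
import numpy as np
from collections import Counter

# Toy KNN: distances to all training points, then a vote among the K nearest.
train_X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
train_y = ['me', 'me', 'me', 'son', 'son', 'son']
query = np.array([0.5, 0.5])
K = 3

dists = np.linalg.norm(train_X - query, axis=1)   # distance to every training point
nearest = np.argsort(dists)[:K]                   # indices of the K closest
vote = Counter(train_y[i] for i in nearest).most_common(1)[0][0]
print(vote)
```

The O(n) distance pass per query is exactly why KNN's testing time grows with the training set.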
Logistic Regression.
Despite the word 'regression' in its name, it is actually a classification model. Logistic regression feeds the features into a logistic function whose output lies in the range [0, 1], quantifying the probability of the class. Over all the training points, these probabilities combine into the likelihood that the training data is correctly predicted, so the objective of logistic regression training is to maximize this likelihood. There are many fitting algorithms, e.g. Newton-Raphson, iteratively re-weighted least squares, etc.
Logistic regression is widely used in industry, but it doesn't perform well when the feature space is large, and I have no idea whether it will do a great job in this project. Default hyper-parameters will be used.
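The logistic (sigmoid) function at the heart of the model squashes any score into (0, 1), which is then read as a probability; the input values below are illustrative:

```python
import math

# The logistic function: maps any real score z to a value in (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-4, 0, 4):
    print(z, round(sigmoid(z), 3))
```

A score of 0 maps to exactly 0.5, i.e. maximal uncertainty between the two sides of the decision boundary.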
Deep Learning.
Without a doubt, deep learning has gained great exposure and momentum over the past few years, especially in computer vision and natural language processing. In this project, the feature extractor takes almost 95% of the Google Inception V3 layers. Instead of keeping the remaining 5%, which classify objects into 1000 classes, I will add a few dense layers of my own to classify objects into my 7 classes. Essentially, this is a transfer learning approach.
Deep learning takes a lot of computation resources and time for gradient descent to find the best weights of the neurons. In this project, however, I only need to train the few dense layers I added, not the convolutional layers.
Although I have a large photo library, there is still a concern about the quantity of good-quality images. If each class only has an average of 5 images, I probably won't get the results I expect. So, if needed, techniques like an image generator (data augmentation) will be explored in this project.
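As a taste of what an image generator does, here is one such augmentation (a horizontal flip) sketched in plain numpy; a real generator such as Keras's ImageDataGenerator would also apply rotations, shifts and zooms:

```python
import numpy as np

# A horizontal flip doubles the effective training data for free:
# a mirrored face is still the same person.
face = np.arange(12).reshape(2, 2, 3)   # tiny stand-in for a 299x299x3 face
flipped = face[:, ::-1, :]              # mirror along the width axis
print(face[0, 0], '->', flipped[0, 1])  # the same pixel, after mirroring
```

Flipping twice recovers the original image, which is an easy correctness check for any such transform.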
When I realized I was trying to solve a multi-class classification problem, my first intuition was to ask whether the data is linearly separable. Therefore, the performance of a linear classifier will serve as the benchmark. A linear classifier is usually fast and easy to train; if a more complicated model takes more time to train or test, it ought to perform better on accuracy (F1 score) to compensate for the larger cost in time. If not, I'd rather choose the simple linear model.
There are many supporting functions; process_faces() is the one with flags controlling whether to display or save the given image file.
# pathlib available from python 3.5
from pathlib import Path
def get_faces(image, scaleFactor, minNeighb):
"""Perform face detection and return the detected faces as a list of (x,y,w,h).
Args:
image (numpy.ndarray): The numpay array of an image.
scaleFactor (float): The scaling factor to be used by the detectMultiScale() function.
minNeighb (int): The number of minimum neighbors to be used by the detectMultiScale() function.
Returns:
faces (list of tuples): The list of the face locations.
"""
# Convert to RGB then to grayscale
image = np.copy(image)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
# Extract the pre-trained face detector from an xml file
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
# Detect the faces in image
faces = face_cascade.detectMultiScale(gray, scaleFactor, minNeighb)
return faces
def draw_bounding_box(image, faces):
"""Draw the bounding box of faces on the image.
Args:
image (np.ndarray): Numpy array of the image.
faces (list of tuples): The list of the face locations.
Return:
image_with_detections (np.ndarray): A image with bounding box on faces, in numpy array format,
after converting to RGB.
image_faces (list[np.ndarray]): List of face images.
"""
    # Use np.copy() to create duplicate images to avoid alteration of the original image.
image_copy = np.copy(image)
image_with_detections = np.copy(image)
image_copy = cv2.cvtColor(image_copy, cv2.COLOR_BGR2RGB)
image_with_detections = cv2.cvtColor(image_with_detections, cv2.COLOR_BGR2RGB)
# The list of detected faces
image_faces = []
# Get the bounding box for each detected face
for (x,y,w,h) in faces:
# Add a red bounding box to the detections image
if w > 200:
line_width = w//20
else:
line_width = 3
image_faces.append(image_copy[y:(y+h), x:(x+w)])
cv2.rectangle(image_with_detections, (x,y), (x+w,y+h), (255,0,0), line_width)
return image_with_detections, image_faces
def create_get_target_dir_file(file_path):
    """Create the target directory if it doesn't exist; return the target directory and file path.
    Args:
        file_path (str): The full path of the original photo.
    Returns:
        target_dir (str): The address of the target directory.
        target_file (str): The full path of the target file.
    """
    # Create the full path of the target image by replacing the photo_dir prefix with target_image_dir.
    target_file = file_path.replace(photo_dir, target_image_dir)
    target_dir = os.path.dirname(target_file)
    target_path = Path(target_dir)
    # Create parent directories too; don't raise an exception if the directory exists
    target_path.mkdir(parents=True, exist_ok=True)
    return target_dir, target_file
def save_faces(file_path, image_faces):
"""Save each face image into new files in target_iamge_dir.
Args:
file_path (str): The full path of the original photo.
image_faces (list[np.ndarray]): The list of face images, in numpy array format.
Returns:
None
"""
if len(image_faces) == 0:
return
target_dir, target_file = create_get_target_dir_file(file_path)
# Resize and save each face image.
for i, face in enumerate(image_faces):
face = cv2.resize(face, (299, 299))
        # cv2.imwrite() cannot handle unicode paths, so chdir into the (possibly
        # unicode) target directory and write with an ASCII file name instead.
        os.chdir(target_dir)
file_name = os.path.basename(target_file)
cv2.imwrite(file_name + '-face-' + str(i) + '.jpg', cv2.cvtColor(face, cv2.COLOR_BGR2RGB))
print(os.path.join(target_dir, file_name + '-face-' + str(i) + '.jpg'), 'saved.')
def process_faces(file_path, display=NODISPLAY, save=NOSAVE, scaleFactor=1.3, minNeighb=5):
"""Process the input image file by extracting face(s). Display and save based on the flags.
Args:
file_path (str): The full path of the input image file.
display (bool): Default is NODISPLAY/False.
save (bool): Default is NOSAVE/False.
scaleFactor (float): The scaling factor used by face detection function.
minNeighb (int): The number of minimum neighbors.
Returns:
None: Perform display or save actions based on the flags.
"""
print('Image path', file_path)
image = get_numpy_from_file(file_path)
faces = get_faces(image, scaleFactor, minNeighb)
print('Number of faces detected:', len(faces))
image_with_detections, image_faces = draw_bounding_box(image, faces)
if save:
save_faces(file_path, image_faces)
if display:
# Display the image with the detections
display_from_numpy(image_with_detections)
# Return to this project's current working directory
os.chdir(cur_dir)
# Load in color image for face detection
file_path = os.path.join(photo_dir, '2017\\2017.11.16 - Singapore Fintech Festival', '20171116-1923.jpg')
process_faces(file_path, DISPLAY, SAVE)
Read and display the 5 faces extracted from the above image.
def get_file_path_from_folder(folder_addr):
"""Return all jpg files' full paths as a list.
Args:
        folder_addr (str): The folder address.
Returns:
all_jpg (list): A list of strings which are the full path of jpg files.
"""
all_jpg = []
for root, dirs, files in os.walk(folder_addr):
for file in files:
if file[-3:].lower() == 'jpg':
#print(len(path) * '---', file)
all_jpg.append(os.path.join(root, file))
return all_jpg
def get_numpy_from_folder(folder_addr):
"""Return all jpg files in a folder as numpy arrays.
Args:
folder_addr (str): The folder address.
Returns:
image_numpys (list[numpy.ndarrays]): The list of numpy arrays of jpg images in a folder.
"""
all_jpg_addr = get_file_path_from_folder(folder_addr)
image_numpys = []
for jpg_addr in all_jpg_addr:
image = get_numpy_from_file(jpg_addr)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_numpys.append(image)
return image_numpys
image_numpys = get_numpy_from_folder(os.path.join(cur_dir, 'images\\2017\\2017.11.16 - Singapore Fintech Festival'))
display_from_numpy(image_numpys, fig_dim_x=15, fig_dim_y=5, plot_nrows=1, plot_ncols=5)
In an earlier section, all 40k photo paths were stored in the all_jpg variable. Now it's just a matter of patience: it will take quite a few hours to process them all.
#############################
### RUN WITH CAUTION ########
#############################
# Scan through all 40k photos and extract faces
start = time.time()
for i in range(len(all_jpg)):
process_faces(all_jpg[i], NODISPLAY, SAVE)
end = time.time()
def get_count_files(folder_addr):
n = 0
for root, dirs, files in os.walk(folder_addr):
for file in files:
n = n + 1
return n
print('Extracting faces took %s hours' % round((end-start)/3600, 1))
print(get_count_files('images'), 'face images extracted')
It took about 13 hours and around 100k face images were generated. The sample images below are from one folder. It's quite clear that OpenCV is not 100% reliable; some of the extracted images are not faces.
image_numpys = get_numpy_from_folder(os.path.join('images', '2017', '2017.10.25 - 祺祺吃饭'))
display_from_numpy(image_numpys, fig_dim_x=15, fig_dim_y=10, plot_nrows=3, plot_ncols=6)
Of the almost 100k faces extracted, most are irrelevant, and many are objects other than faces. I hand-picked about 500 of the face images and saved them into 7 categories/folders.
categories = os.listdir('./images')
face_jpgs = get_file_path_from_folder('./images/')
print('Categories:', categories)
print('Total number of images:', len(face_jpgs))
Below are some high-level statistics on how many images are in each category.
def count_each_category(categories, files):
"""Count the number of file for each category.
Args:
categories (list): A list of categories.
files (list[str]): A list of strings which are the relative address of face images.
Returns:
stats_dict (dict): A dictionary of (category: number).
"""
stats_dict = {}
    # Initialize the dictionary.
for category in categories:
stats_dict[category] = 0
# Increment the value of the matching item.
for file in files:
# Convert the path string into Path object.
file_path = Path(file)
        # str(file_path.parent) returns e.g. 'images\Brother'; use os.sep as the delimiter
        # for cross-platform use.
file_category = str(file_path.parent).split(os.sep)[1]
stats_dict[file_category] += 1
return stats_dict
stats_dict = count_each_category(categories, face_jpgs)
print(stats_dict)
The bar plot below shows the distribution of samples per category. The 'Daughter' category has the most images, 92; 'Son' has the second most, 73; and so on. The distribution of samples is not well balanced, an observation that will lead to some special 'treatment' in later sections.
import matplotlib.pyplot as plt
%matplotlib inline
plt.bar(stats_dict.keys(), stats_dict.values())
plt.xlabel('Categories')
plt.ylabel('Number of instances')
plt.title('Distribution of instances for each category')
plt.show()
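One possible 'treatment' for such imbalance (not necessarily the one applied later, which relies on stratified splitting and weighted metrics) is to weight classes by inverse frequency. The counts below are illustrative, except for 'daughter' (92) and 'son' (73), which come from the plot:

```python
# Inverse-frequency class weights: rarer classes get larger weights,
# so a model or metric doesn't simply favour the biggest class.
counts = {'daughter': 92, 'son': 73, 'me': 60, 'wife': 55,
          'dad': 50, 'mum': 45, 'brother': 40}  # illustrative counts
total = sum(counts.values())
n_classes = len(counts)
weights = {c: total / (n_classes * n) for c, n in counts.items()}
for c in sorted(weights, key=weights.get):
    print(c, round(weights[c], 2))
```

By construction, each class contributes the same total weight, which is the balancing effect we want.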
# Function to load the images as a list of file names and one hot code categories
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split
from keras.utils import np_utils
from glob import glob
# Read all the files and return 2 numpy arrays.
# One is the address of the files and the other one is the one hot encode of the category.
def load_dataset(folder_addr):
"""Load all the files in the given directory. The name of each subdirectory will be the category name.
Args:
folder_addr (str): The folder address in which there are subfolders.
Returns:
face_files (list[str]): A list of face file address strings.
face_targets (numpy.ndarray): Numpy array of categories, value from 0 to 6, without one-hot encoding.
"""
data = load_files(folder_addr)
face_files = np.array(data['filenames'])
# face_targets = np_utils.to_categorical(np.array(data['target']), 7)
face_targets = np.array(data['target'])
    # Windows and macOS often produce 'desktop.ini' or '.DS_Store' files automatically.
    # They need to be omitted.
invalid_files_idx = []
for i in range(len(face_files)):
if 'desktop.ini' in face_files[i] or 'DS_Store' in face_files[i]:
invalid_files_idx.append(i)
face_files = np.delete(face_files, invalid_files_idx)
face_targets = np.delete(face_targets, invalid_files_idx)
return face_files, face_targets
# Load the list of images and categories
faces, targets = load_dataset('./images')
face_names = [item[9:-1] for item in glob('./images/*/')]
print('There are %d face categories.' % len(face_names))
print(face_names)
print('There are %d total faces.' % len(faces))
print(count_each_category(categories, faces))
A random check of a file name against its category name.
def parse_image_category(file_addr, category_code, is_one_hot=False):
"""Given the category in index or one-hot code, return the index and name of the category of an image.
Args:
file_addr (str): The address of the image whose name contains the category name.
category_code (int or numpy.nparray): Integer index of the category or numpy array after one-hot encoding.
is_one_hot (bool): Default it's False, set to True if the passed in category_code is in one-hot format.
Returns:
category_index (int): The index of the category, from 0 to 6.
face_names[category_index] (str): The name of the category.
"""
# Print file path and category in one hot code
print(file_addr, category_code)
if is_one_hot:
category_index = np.argmax(category_code)
else:
category_index = category_code
return category_index, face_names[category_index]
parse_image_category(faces[100], targets[100])
From this point, the data set is finally ready for the coming machine learning pipelines.
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Input
from keras import backend as K
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
from keras.callbacks import ModelCheckpoint, EarlyStopping, LambdaCallback, ReduceLROnPlateau
from keras.models import load_model
from keras.preprocessing import image
def path_to_tensor(img_path):
    """Given one image path, return it as a numpy array.
    Args:
        img_path (str): The full path string of an image.
    Returns:
        np.expand_dims(x, axis=0) (numpy.ndarray): A 4D numpy array of an image after expanding from 3D to 4D.
    """
    img = image.load_img(img_path, target_size=(299,299))
    x = image.img_to_array(img)
    return np.expand_dims(x, axis=0)
def paths_to_tensor(img_paths):
"""Given the image paths, return them as a vertically stacked numpy array.
Args:
img_paths (list[str]): The full paths of the images in a list.
Returns:
np.vstack(list_of_tensors) (numpy.ndarray): 4D numpy array of all the images, after normalization.
"""
list_of_tensors = [path_to_tensor(img_path) for img_path in img_paths if 'desktop.ini' not in img_path]
return np.vstack(list_of_tensors).astype('float32')/255
# Load the inception v3 model, include the dense layers
base_model = InceptionV3(weights='imagenet', include_top=True)
vector_out = base_model.get_layer('avg_pool')
feature_model = Model(inputs=base_model.input, outputs=vector_out.output)
def get_features(feature_model, tensors):
    """Use the given model to convert tensors/images to feature vectors.
    Args:
        feature_model (Keras Model): In this project, the Google Inception V3 model without the last dense layer.
        tensors (numpy.ndarray): Group of images in a 4D numpy array.
    Returns:
        feature_outputs (numpy.ndarray): Feature vectors of the group of images, dimension of (x, 2048).
    """
    feature_outputs = feature_model.predict(tensors)
    return feature_outputs
Split the dataset into train, validation and test sets. As the earlier bar plot showed, the data is not well balanced; hence a stratified split will be used here.
train_faces, test_faces, train_targets, test_targets = train_test_split(faces, targets, test_size=0.15, random_state=1, stratify=targets)
train_faces, validate_faces, train_targets, validate_targets = train_test_split(train_faces, train_targets, test_size=0.2, random_state=1, stratify=train_targets)
print('There are %d training faces.' % len(train_faces))
print(count_each_category(categories, train_faces))
print('There are %d validate faces.' % len(validate_faces))
print(count_each_category(categories, validate_faces))
print('There are %d test faces.' % len(test_faces))
print(count_each_category(categories, test_faces))
# Read the images as numpy arrays
train_tensors = paths_to_tensor(train_faces)
test_tensors = paths_to_tensor(test_faces)
validate_tensors = paths_to_tensor(validate_faces)
print("Train tensor shape.", train_tensors.shape)
print('Test tensor shape.', test_tensors.shape)
print('Validate tensor shape.', validate_tensors.shape)
train_features = get_features(feature_model, train_tensors)
validate_features = get_features(feature_model, validate_tensors)
test_features = get_features(feature_model, test_tensors)
print('Train features shape:', train_features.shape, '\nValidate features shape:', validate_features.shape,
      '\nTest features shape:', test_features.shape)
'Traditional' machine learning models (SGDClassifier, LogisticRegression, KNeighborsClassifier) will be explored first, followed by a deep neural network.
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
clf_sgd = SGDClassifier(random_state=0)
clf_knn = KNeighborsClassifier(n_neighbors=7)
clf_log = LogisticRegression(random_state=0)
%time model_sgd = clf_sgd.fit(train_features, train_targets)
%time model_knn = clf_knn.fit(train_features, train_targets)
%time model_log = clf_log.fit(train_features, train_targets)
The parameters of the 3 models are as follows. Most hyper-parameters are left at their defaults, except random_state for SGD and logistic regression and n_neighbors for KNN.
model_sgd.get_params()
model_knn.get_params()
model_log.get_params()
# start = time.time()
%time predict_test_sgd = model_sgd.predict(test_features)
%time predict_test_knn = model_knn.predict(test_features)
%time predict_test_log = model_log.predict(test_features)
# end = time.time()
# print('%.2gs' %(end - start))
Due to the imbalanced data mentioned earlier, the averaging of the precision, recall and f-beta scores will be set to 'weighted'.
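A toy illustration (made-up numbers) of why 'weighted' differs from a plain macro average: each class counts in proportion to its support, so a strong score on the dominant class pulls the weighted average up:

```python
# Two classes with very different support, made-up per-class F1 scores.
f1_per_class = [0.9, 0.5]   # per-class F1
support = [90, 10]          # number of true instances per class

macro = sum(f1_per_class) / len(f1_per_class)
weighted = sum(f * s for f, s in zip(f1_per_class, support)) / sum(support)
print(macro, weighted)
```

With imbalanced classes the weighted average better reflects the overall experience across all samples.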
from sklearn.metrics import fbeta_score, precision_recall_fscore_support, confusion_matrix
import pandas as pd
def get_score_numpy(test_targets, predict_targets, average='weighted'):
    """Return precision, recall and F1 as a numpy array, dropping the support entry."""
    score = precision_recall_fscore_support(test_targets, predict_targets, average=average)
    score = np.array(score)
    # When an average is set, support is returned as None; drop it.
    score = score[score != None]
    return score
score_sgd = get_score_numpy(test_targets, predict_test_sgd)
score_knn = get_score_numpy(test_targets, predict_test_knn)
score_log = get_score_numpy(test_targets, predict_test_log)
data = pd.DataFrame(np.stack((score_sgd, score_knn, score_log)),
columns=['Precision', 'Recall', 'F1'], index=['SGD', 'KNN', 'Log'])
data
Based on the table above, logistic regression performed best from every angle: precision, recall and F1. It is fast at testing, but not at training.
The confusion matrix plots below give a more intuitive view of how well the logistic regression model does for each label. The first plot uses absolute sample counts; the second is normalized, which gives a more accurate picture.
import itertools
def plot_confusion_matrix(cm, classes,
normalize=False,
title='Confusion matrix',
cmap=plt.cm.Blues):
"""
This function prints and plots the confusion matrix.
Normalization can be applied by setting `normalize=True`.
"""
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation=45)
plt.yticks(tick_marks, classes)
fmt = '.2f' if normalize else 'd'
thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
plt.text(j, i, format(cm[i, j], fmt), horizontalalignment="center", color="white" if cm[i, j] > thresh else "black")
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
cnf_matrix = confusion_matrix(test_targets, predict_test_log)
plt.figure(figsize=(6,6))
plot_confusion_matrix(cnf_matrix, classes=face_names, title='Confusion matrix, without normalization')
plt.figure(figsize=(6,6))
plot_confusion_matrix(cnf_matrix, classes=face_names, normalize=True, title='Normalized confusion matrix')
plt.show()
So far all the chosen models performed training and prediction in fractions of a second. The best accuracy is about 80%, from logistic regression, while KNN got the lowest at 68%, though even that is much better than a random guess of 1/7 = 16.7%. Below is an exploration of a DNN model. After a few tries I decided on 3 dense layers, as that provides a good balance between accuracy and time.
Inp = Input(shape=(2048,))
x = Dense(300, activation='relu')(Inp)
x = Dense(50, activation='relu')(x)
output = Dense(7, activation='softmax')(x)
model_dnn = Model(inputs=Inp, outputs=output)
model_dnn.summary()
As a classification problem, the deep learning model requires the targets in one-hot format. Otherwise, classes 1 and 2 would be treated as ordinal instead of nominal.
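As a quick sketch of what the one-hot format looks like (pure Python, with made-up labels; the actual conversion below uses Keras's np_utils.to_categorical):

```python
def to_one_hot(labels, num_classes):
    """Convert integer class labels to one-hot row vectors."""
    vectors = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        vectors.append(row)
    return vectors

# Each class gets its own axis, so class 2 is no longer 'bigger' than class 1.
print(to_one_hot([0, 2, 1], 3))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```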
train_targets_one_hot = np_utils.to_categorical(train_targets)
validate_targets_one_hot = np_utils.to_categorical(validate_targets)
test_targets_one_hot = np_utils.to_categorical(test_targets)
model_dnn.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model_dnn.save_weights('init_weights.h5')
checkpointer = ModelCheckpoint(filepath='model_dnn.h5', verbose=1, save_best_only=True)
%%time
model_dnn.load_weights('init_weights.h5')
hist_1 = model_dnn.fit(train_features, train_targets_one_hot,
validation_data=(validate_features, validate_targets_one_hot),
epochs=50, verbose=1, batch_size=20,
callbacks=[checkpointer])
# Visualize the training and validation accuracy and loss of the neural network
import matplotlib.pyplot as plt
def plt_hist(hist):
print(hist.history.keys())
# summarize history for accuracy
plt.plot(hist.history['acc'])
plt.plot(hist.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
plt_hist(hist_1)
50 epochs looks like a reasonable number of iterations. The validation accuracy is stable and the validation loss is starting to grow, which is a sign of overfitting. Techniques such as reducing the learning rate or early stopping could be explored further, but are not covered in this project.
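Keras provides an EarlyStopping callback with a patience parameter for exactly this purpose; the pure-Python sketch below (with made-up validation losses) just illustrates the idea of stopping once the validation loss stops improving.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop: the point where
    validation loss has not improved for `patience` consecutive epochs."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the last epoch

# Validation loss improves, then grows for 3 straight epochs.
losses = [0.9, 0.7, 0.6, 0.55, 0.60, 0.62, 0.65, 0.70]
print(early_stop_epoch(losses))  # 6
```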
Below are the metrics of the DNN model.
model_dnn = load_model('./model_dnn.h5')
%time predict_test_dnn = model_dnn.predict(test_features)
predict_test_dnn = predict_test_dnn.argmax(axis=-1)
print('Precision, Recall, F1')
print(get_score_numpy(test_targets, predict_test_dnn))
cnf_matrix = confusion_matrix(test_targets, predict_test_dnn)
plt.figure(figsize=(6,6))
plot_confusion_matrix(cnf_matrix, classes=face_names, title='Confusion matrix, without normalization')
plt.figure(figsize=(6,6))
plot_confusion_matrix(cnf_matrix, classes=face_names, normalize=True, title='Normalized confusion matrix')
plt.show()
Based on the current split of the dataset, logistic regression performed the best. KNN reached merely 70%, while the SGD classifier, logistic regression and the DNN reached accuracies of 75% to 80%. The DNN took significantly longer for training, as well as for prediction.
But what if it's just a coincidence? What if I had more images? The next section looks into refining the models in two ways.
The first is to use an image generator to build a bigger dataset. The other focuses on using cross-validation to determine the best model.
Use an image generator to create more images. This is a very useful technique when little data is available.
In this project, I put the new images in the folder images2, and 10 new images are generated for each original image. Generating new images basically means applying some distortion to the originals. I think it's reasonable to narrow, widen, rotate, zoom and flip the images a little, within a reasonable range. Hence I set most parameter values to 20%.
Before the image generator I had only 419 images for training and testing combined. Now I have 10 times more: each round of training will use more than 4,000 images.
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import shutil
datagen = ImageDataGenerator(
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
def get_target_dir(face_file):
""" Given the address of the original image, create directory for the new generated images if needed.
Also return the new directory address.
Args:
        face_file (str): The address of the original face file; may or may not be an absolute path.
Returns:
target_dir (str): The address of the directory for the generated images.
"""
target_file = face_file.replace('images', 'images2')
target_dir = os.path.dirname(target_file)
target_path = Path(target_dir)
# Create parents of directory, don't raise exception if the directory exists
target_path.mkdir(parents=True, exist_ok=True)
return target_dir
def generate_images(faces):
""" Given the address of face images, use image generator to generate new images.
Args:
faces ([str]): The list of face images addresses.
Returns:
None: Generate new images and return None.
"""
if os.path.exists('images2'):
shutil.rmtree('images2', ignore_errors=True)
# sleep for 2 seconds to allow OS finish the previous deletion action.
time.sleep(2)
for n, face in enumerate(faces):
if n % 100 == 0:
print('Generating images, ' + str(n) + ' of ' + str(len(faces)) + ' faces')
target_dir = get_target_dir(face)
img = load_img(face)
        x = img_to_array(img)  # a Numpy array with shape (299, 299, 3)
        x = x.reshape((1,) + x.shape)  # a Numpy array with shape (1, 299, 299, 3)
        # Copy the current face image (before augmentation) to the target directory.
shutil.copy2(face, target_dir)
# the .flow() command below generates batches of randomly transformed images
# and saves the results to the target directory
        i = 0
        for batch in datagen.flow(x, batch_size=1, save_to_dir=target_dir, save_prefix='gen', save_format='jpg'):
            i += 1
            if i >= 10:  # stop after exactly 10 generated images
                break  # otherwise the generator would loop indefinitely
image_numpys = get_numpy_from_folder('sample_generated')
display_from_numpy(image_numpys, fig_dim_x=15, fig_dim_y=5, plot_nrows=1, plot_ncols=5)
For a more stable performance estimate of each model, K-fold cross-validation is used in this project, so that each and every image gets the chance to be used for both training and testing. This gives an averaged, and thus more stable, performance evaluation that shows the robustness of each model.
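To illustrate why every image gets a turn in the test set, here is a minimal sketch of how K-fold splitting assigns indices (stratification by label is omitted for brevity; the project itself uses sklearn's StratifiedKFold):

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs. Every sample index lands in
    exactly one test fold across the n_splits rounds."""
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

all_tested = sorted(i for _, test in kfold_indices(10, 3) for i in test)
print(all_tested)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: each sample is tested exactly once
```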
from sklearn.model_selection import StratifiedKFold
n_splits = 10
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)  # random_state only takes effect when shuffle=True
skf.get_n_splits(faces, targets)
def perform_training(clf, features, targets):
''' Perform training, return the trained model and time taken in seconds.
Args:
clf (sklearn-model): The sklearn model.
features (numpy.array): Training input, 4D numpy array.
targets (numpy.array): Training output, 1D numpy array.
Returns:
        model (sklearn-model): The trained model.
end - start: Time taken, in seconds.
'''
start = time.time()
model = clf.fit(features, targets)
end = time.time()
return model, round(end - start, 3)
def perform_testing(model, features):
''' Perform testing, return the accuracy and time taken in seconds.
Args:
model (sklearn-model): The sklearn model.
features (numpy.array): Testing input, 4D numpy array.
Returns:
        predictions ([int]): The list of predicted labels.
end - start: Time taken, in seconds.
'''
start = time.time()
predictions = model.predict(features)
end = time.time()
return predictions, round(end - start, 3)
def perform_dnn_training(model, features, targets):
    ''' Perform training, return the trained model and time taken in seconds.
    Args:
        model (keras.Model): The Keras model.
        features (numpy.array): Training input, 4D numpy array.
        targets (numpy.array): Training output, 1D numpy array.
    Returns:
        model (keras.Model): The trained model.
        end - start: Time taken, in seconds.
    '''
targets = np_utils.to_categorical(targets)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
start = time.time()
model.fit(features, targets, epochs=50, verbose=0, batch_size=20)
end = time.time()
return model, round(end - start, 3)
def perform_dnn_testing(model, features):
''' Perform testing, return the accuracy and time taken in seconds.
Args:
model (keras.model): The keras model
features (numpy.array): Testing input, 4D numpy array.
Returns:
predictions ([int]): The list of predicted labels.
end - start: Time taken, in seconds.
'''
start = time.time()
probas = model.predict(features)
predictions = probas.argmax(axis=-1)
end = time.time()
return predictions, round(end - start, 3)
all_test_targets = []
all_predict_targets = []
all_train_times = []
all_test_times = []
n = 0
for train_index, test_index in skf.split(faces, targets):
n = n + 1
print('Round ' + str(n) + ' of ' + str(n_splits))
x_train, x_test = faces[train_index], faces[test_index]
y_train, y_test = targets[train_index], targets[test_index]
generate_images(x_train)
    # After generating the images, reload the training dataset; the testing dataset remains the same
x_train, y_train = load_dataset('./images2')
face_names = [item[10:-1] for item in glob('./images2/*/')]
x_train_tensors = paths_to_tensor(x_train)
x_test_tensors = paths_to_tensor(x_test)
x_train_features = get_features(feature_model, x_train_tensors)
x_test_features = get_features(feature_model, x_test_tensors)
# Define and initialize models.
clf_sgd = SGDClassifier(random_state=0)
clf_knn = KNeighborsClassifier(n_neighbors=7)
clf_log = LogisticRegression(random_state=0)
predict_targets = []
train_times = []
test_times = []
for clf in [clf_sgd, clf_knn, clf_log]:
model, train_time = perform_training(clf, x_train_features, y_train)
cur_predict_targets, test_time = perform_testing(model, x_test_features)
all_predict_targets.append(cur_predict_targets)
all_test_targets.append(y_test)
train_times.append(train_time)
test_times.append(test_time)
    # Re-initialize the DNN weights by reloading 'init_weights.h5', which was
    # saved in the earlier section for the first DNN experiment.
model_dnn.load_weights('init_weights.h5')
model_dnn, train_time = perform_dnn_training(model_dnn, x_train_features, y_train)
cur_predict_targets, test_time = perform_dnn_testing(model_dnn, x_test_features)
all_predict_targets.append(cur_predict_targets)
all_test_targets.append(y_test)
train_times.append(train_time)
test_times.append(test_time)
all_train_times.append(train_times)
all_test_times.append(test_times)
In this section, the results from the refinement section above are analysed, and the best model is determined.
First, convert the predicted targets and test targets into lists of size 4, one entry per model.
print('The length of all_predict_targets:', len(all_predict_targets))
print('Shape of the element in all_predict_targets:', all_predict_targets[0].shape)
import pickle
pickle.dump(all_predict_targets, open('all_predict_targets', 'wb'))
pickle.dump(all_test_targets, open('all_test_targets', 'wb'))
pickle.dump(all_train_times, open('all_train_times', 'wb'))
pickle.dump(all_test_times, open('all_test_times', 'wb'))
# Results were appended fold-major (SGD, KNN, Log, DNN within each fold),
# so entry i belongs to model i % 4; concatenate each model's folds together.
flat_predict_targets = all_predict_targets.copy()
flat_test_targets = all_test_targets.copy()
for i in range(4, len(flat_predict_targets)):
    r = i % 4
    flat_predict_targets[r] = np.concatenate((flat_predict_targets[r], flat_predict_targets[i]))
    flat_test_targets[r] = np.concatenate((flat_test_targets[r], flat_test_targets[i]))
flat_predict_targets = flat_predict_targets[:4]
flat_test_targets = flat_test_targets[:4]
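The fold-major bookkeeping can be checked with toy data: with 2 models and 2 folds (made-up single-element results), entry i belongs to model i % 2, and the same loop collapses the list to one entry per model.

```python
# Toy results stored fold-major: fold 1 (model A, model B), fold 2 (model A, model B).
results = [[1], [2],
           [3], [4]]
n_models = 2
flat = list(results)
for i in range(n_models, len(flat)):
    flat[i % n_models] = flat[i % n_models] + flat[i]
flat = flat[:n_models]
print(flat)  # [[1, 3], [2, 4]]: all folds of model A, then all folds of model B
```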
score_sgd = get_score_numpy(flat_test_targets[0], flat_predict_targets[0])
score_knn = get_score_numpy(flat_test_targets[1], flat_predict_targets[1])
score_log = get_score_numpy(flat_test_targets[2], flat_predict_targets[2])
score_dnn = get_score_numpy(flat_test_targets[3], flat_predict_targets[3])
cnf_matrix_sgd = confusion_matrix(flat_test_targets[0], flat_predict_targets[0])
cnf_matrix_knn = confusion_matrix(flat_test_targets[1], flat_predict_targets[1])
cnf_matrix_log = confusion_matrix(flat_test_targets[2], flat_predict_targets[2])
cnf_matrix_dnn = confusion_matrix(flat_test_targets[3], flat_predict_targets[3])
Logistic regression returned the highest F1 score of 0.83. Below are the normalized confusion matrix plots of the 4 models, from which the precision and recall rates can be clearly seen.
plt.figure(figsize=(6,6))
plot_confusion_matrix(cnf_matrix_sgd, classes=face_names, normalize=True, title='SGD - Normalized confusion matrix')
plt.figure(figsize=(6,6))
plot_confusion_matrix(cnf_matrix_knn, classes=face_names, normalize=True, title='KNN - Normalized confusion matrix')
plt.figure(figsize=(6,6))
plot_confusion_matrix(cnf_matrix_log, classes=face_names, normalize=True, title='Log - Normalized confusion matrix')
plt.figure(figsize=(6,6))
plot_confusion_matrix(cnf_matrix_dnn, classes=face_names, normalize=True, title='DNN - Normalized confusion matrix')
plt.show()
All the models seem to have slight difficulty distinguishing between my son and my daughter.
# Stack the per-fold timings before plotting (defined here so this cell runs on its own).
np_train_times = np.vstack(all_train_times)
np_test_times = np.vstack(all_test_times)
fig = plt.figure(figsize=(16, 4))
ax = fig.add_subplot(1, 2, 1)
ax.boxplot(np_train_times)
ax.set_title('Training Time (seconds)')
ax.set_xticklabels(['SGD', 'KNN', 'Log', 'DNN'])
ax = fig.add_subplot(1, 2, 2)
ax.boxplot(np_test_times)
ax.set_title('Testing Time (seconds)')
ax.set_xticklabels(['SGD', 'KNN', 'Log', 'DNN'])
plt.show()
The SGD classifier is the fastest at both training and testing. KNN is fast on training but took the longest time for testing because it needs to compute the distance to all the training points at prediction time. Logistic regression takes a long time to train but is the fastest at testing. The DNN took the longest time for training, and its testing time is the second longest.
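A minimal brute-force 1-NN (toy points, not the project's 2,048-dimensional features) makes the asymmetry obvious: 'training' just stores the data, while every prediction scans the whole training set.

```python
def predict_1nn(train_x, train_y, query):
    """Predict by scanning every training point -- this full scan at
    prediction time is what makes KNN slow to test."""
    best_label, best_dist = None, float('inf')
    for point, label in zip(train_x, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(point, query))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

train_x = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train_y = ['A', 'A', 'B']
print(predict_1nn(train_x, train_y, (4.5, 4.8)))  # B
```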
score = pd.DataFrame(np.stack((score_sgd, score_knn, score_log, score_dnn)),
columns=['Precision', 'Recall', 'F1'], index=['SGD', 'KNN', 'Log', 'DNN'])
score
np_train_times = np.vstack(all_train_times)
np_test_times = np.vstack(all_test_times)
# Use a name other than `time` so the time module used above is not shadowed.
time_df = pd.DataFrame(np.stack((np.mean(np_train_times, axis=0), np.mean(np_test_times, axis=0))),
                       columns=['SGD', 'KNN', 'Log', 'DNN'], index=['Train (seconds)', 'Test (seconds)']).T
print('Average time for training and testing.')
time_df
The linear model (SGDClassifier) did quite well in both the first try and the refinement section. It is nearly the fastest at both training and testing, and its performance serves as the baseline of this project's benchmark.
KNN doesn't perform well in terms of accuracy (F1 score). It also takes significantly longer at testing.
The DNN produced the second-highest F1 score, but it took significantly longer for both training and testing.
Logistic regression did the best job on accuracy (F1 score) but took a longer training time. In my opinion, this is the model with the best balance between accuracy and speed.
Based on the earlier refinement and benchmark section, logistic regression was determined as the best model. But its parameters are based on the default ones. Maybe accuracy or speed can be further pushed by tuning the hyper-parameters.
model_log.get_params()
# Load the list of images and categories
faces, targets = load_dataset('./images')
face_names = [item[9:-1] for item in glob('./images/*/')]
train_faces, test_faces, train_targets, test_targets = train_test_split(faces, targets, test_size=0.3, random_state=0, stratify=targets)
# Read the images as numpy arrays
train_tensors = paths_to_tensor(train_faces)
test_tensors = paths_to_tensor(test_faces)
# Extract features
train_features = get_features(feature_model, train_tensors)
test_features = get_features(feature_model, test_tensors)
print('Train features shape:', train_features.shape, '\nTest features shape:', test_features.shape)
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
from sklearn.metrics import make_scorer
model_log_base = LogisticRegression(random_state = 0, penalty='l1')
model_log_default = LogisticRegression(random_state = 0)
parameters = {'penalty':['l1', 'l2'], 'max_iter':[50, 100, 200, 500]}
scorer = make_scorer(fbeta_score, beta = 1.0, average='weighted')
grid_obj = GridSearchCV(model_log_base, parameters, scoring = scorer)
grid_fit = grid_obj.fit(train_features, train_targets)
model_log_best = grid_fit.best_estimator_
print('Base model is with L1 penalty and max iteration of 100.\n', model_log_base)
print('\nDefault model is with L2 penalty and max iteration of 100.\n', model_log_default)
print('\nOptimized model is with L2 penalty and max iteration of only 50.\n', model_log_best)
print('\nTime take for training and testing. Base, default and optimized.')
%time base_predictions = (model_log_base.fit(train_features, train_targets)).predict(test_features)
%time default_predictions = (model_log_default.fit(train_features, train_targets)).predict(test_features)
%time best_predictions = (model_log_best.fit(train_features, train_targets)).predict(test_features)
# Report the before-and-after scores
print("\nBase model\n------")
print("F-score on testing data:", fbeta_score(test_targets, base_predictions, beta = 1.0, average='weighted'))
print("\nDefault Model\n------")
print("F-score on the testing data:", fbeta_score(test_targets, default_predictions, beta = 1.0, average='weighted'))
print("\nOptimized Model\n------")
print("Final F-score on the testing data:", fbeta_score(test_targets, best_predictions, beta = 1.0, average='weighted'))
The F1 score in this section cannot be compared with the scores from earlier sections because a different dataset split is used here. This section demonstrates hyper-parameter tuning of the winning model, logistic regression. GridSearchCV was used to find the best combination of penalty and max_iter. It turns out the default model already comes with the 'best' hyper-parameters, except that max_iter can be reduced from 100 to 50 to make it run faster without any sacrifice in accuracy.
train_faces, test_faces, train_targets, test_targets = train_test_split(faces, targets, test_size=0.15, random_state=1, stratify=targets)
train_faces, validate_faces, train_targets, validate_targets = train_test_split(train_faces, train_targets, test_size=0.2, random_state=1, stratify=train_targets)
print('There are %d training faces.' % len(train_faces))
print(count_each_category(categories, train_faces))
print('There are %d validate faces.' % len(validate_faces))
print(count_each_category(categories, validate_faces))
print('There are %d test faces.' % len(test_faces))
print(count_each_category(categories, test_faces))
Among all 4 models, logistic regression is the one giving the highest accuracy with acceptable training and testing time.
The SGD classifier is the base of the benchmark and reached an F1 score of 81%. It is also the fastest at both training and testing. If time is a big concern, the SGD classifier is probably the best trade-off between accuracy and time.
The logistic regression model has an average accuracy of almost 87%, which pushed the boundary by 6 percentage points. The only cost is that it takes longer to train.
KNN doesn't perform so well in terms of accuracy; I guess it's due to the curse of dimensionality. It is also slow at testing, 40 times slower than the SGD classifier.
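The curse-of-dimensionality guess can be illustrated with random toy points (not the project's features): in high dimensions, pairwise distances concentrate, so the 'nearest' neighbour is barely nearer than the farthest one.

```python
import math
import random

def distance_spread(dim, n_points=50, seed=0):
    """Ratio of the largest to the smallest pairwise distance among
    uniformly random points; a ratio near 1 means distances concentrate."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return max(dists) / min(dists)

# In 2 dimensions neighbours are clearly distinguishable; in 2,048 they are not.
print(distance_spread(2) > distance_spread(2048))  # True
```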
The DNN model in this project is essentially a transfer-learning approach. Its accuracy is quite good, but it takes a long time to train and it's also costly. In this project I'm using a GTX 1070 Ti, a GPU costing around USD 500. If the training were run on a CPU instead, I have no doubt it would take at least 10 times as long. One very interesting thing is that the DNN is the only model that gained more than 10% improvement in F1 score from the image-generator technique. It is said that deep learning takes over from 'traditional' machine learning models when adequate data is available, and based on this project's results that's probably very true.
I will choose logistic regression as my final winning model in this project. It doesn't show a perfect result (>95%) on accuracy, especially considering there are only 7 categories in this project. I doubt it would work well in a real-world scenario where face recognition must be performed among thousands to millions of faces. However, it still shows the effectiveness of feature engineering: leveraging transfer learning and then applying 'traditional' machine learning models. In addition, the hyper-parameters were also explored, and it turned out that max_iter of logistic regression can be reduced from the default 100 to 50 without sacrificing accuracy. This helps reduce the time spent on both training and testing.
One possible way to improve the model may be to use transfer learning to detect the facial key points (centres and corners of the eyes, mouth, nose, etc.) first, then build mathematical models to represent the relative positions of the key points. Normalization may be needed to make the face centred and 'staring' at the camera. The recognition problem then becomes computing the similarity of the key-point maps.
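A rough sketch of that idea (all key-point coordinates below are made up): flatten each face's normalized key points into a vector and compare faces with cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flattened key-point vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical normalized (x, y) key points: left eye, right eye, nose, mouth.
face_a = [0.30, 0.40, 0.70, 0.40, 0.50, 0.55, 0.50, 0.75]
face_b = [0.31, 0.41, 0.69, 0.42, 0.50, 0.56, 0.49, 0.74]
print(cosine_similarity(face_a, face_b) > 0.99)  # True: nearly identical layouts
```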
model_log_best.get_params()
There are two family photos, test_1.jpg and test_2.jpg, in the root of the working directory. Neither of them was used for training or testing in the earlier sections. Let's first perform face detection using process_faces().
process_faces('./test_1.jpg', DISPLAY, NOSAVE, 1.35, 5)
process_faces('./test_2.jpg', DISPLAY, NOSAVE, 1.32, 7)
Now we can train a logistic regression model using the ./images2 dataset.
faces, _ = load_dataset('./images')
generate_images(faces)
faces, targets = load_dataset('./images2')
all_train_tensors = paths_to_tensor(faces)
all_train_targets = targets
all_train_features = get_features(feature_model, all_train_tensors)
clf_log = LogisticRegression(random_state=0)
%time model_log = clf_log.fit(all_train_features, all_train_targets)
import pickle
pickle.dump(clf_log, open('clf_log', 'wb'))
clf_log = pickle.load(open('clf_log', 'rb'))
def get_name(model, image_face):
''' Use the model to make prediction and return the result as the name of the face.
Args:
model (sklearn.model): A trained model.
image_face (numpy.array): A face image in 3D numpy array.
Returns:
face_names[predict_idx] (str): The name of the face.
'''
    # Convert the 3-channel RGB image to a 4D tensor and normalize it
image_face_tensors = image_face.reshape(-1, 299, 299, 3)/255
image_face_features = get_features(feature_model, image_face_tensors)
predict_idx = model.predict(image_face_features)[0]
return face_names[predict_idx]
def process_faces_names(file_path, scaleFactor=1.3, minNeighb=5):
"""Process the input image file by extracting face(s). Draw bounding box and detected name.
Args:
file_path (str): The full path of the input image file.
scaleFactor (float): The scaling factor used by face detection function.
minNeighb (int): The number of minimum neighbors.
Returns:
        None: Display the image.
"""
print('Image path', file_path)
image = get_numpy_from_file(file_path)
# image = cv2.fastNlMeansDenoisingColored(image,None,5,5,7,15)
faces = get_faces(image, scaleFactor, minNeighb)
print('Number of faces detected:', len(faces))
image_with_detections, image_faces = draw_bounding_box(image, faces)
for i, (x,y,w,h) in enumerate(faces):
cur_face = cv2.resize(image_faces[i], (299,299))
name = get_name(model_log, cur_face)
# Write the returned name on the image
cv2.putText(image_with_detections, name,
(x,y-100),cv2.FONT_HERSHEY_SIMPLEX, 7, (255,0,0),20,cv2.LINE_AA)
display_from_numpy(image_with_detections, 12, 12)
process_faces_names('./test_1.jpg', 1.35, 5)
clf_log.get_params()
process_faces_names('./test_2.jpg', 1.32, 7)
The result is not perfect: each photo has one misclassified face. Not good, but not too bad, because we are family and we are born to look similar. Based on the earlier confusion matrix plots, the confusion between my son and my mum comes as no surprise.